155 research outputs found
A complexity analysis of statistical learning algorithms
We apply information-based complexity analysis to support vector machine
(SVM) algorithms, with the goal of a comprehensive continuous algorithmic
analysis. This involves complexity measures in which some
higher order operations (e.g., certain optimizations) are considered primitive
for the purposes of measuring complexity. We consider classes of information
operators and algorithms made up of scaled families, and investigate the
utility of scaling the complexities to minimize error. We look at the division
of statistical learning into information and algorithmic components, at the
complexities of each, and at applications to SVM and more general machine
learning algorithms. We give applications to SVM algorithms graded into
linear and higher order components, and give an example in biomedical
informatics.
On the probabilistic continuous complexity conjecture
In this paper we prove the probabilistic continuous complexity conjecture. In
continuous complexity theory, this states that the complexity of solving a
continuous problem with probability approaching 1 converges (in this limit) to
the complexity of solving the same problem in its worst case. We prove the
conjecture holds if and only if the space of problem elements is uniformly convex.
The non-uniformly convex case has a striking counterexample in the problem of
identifying a Brownian path in Wiener space, where it is shown that
probabilistic complexity converges to only half of the worst case complexity in
this limit.
On Some Integrated Approaches to Inference
We present arguments for the formulation of a unified approach to different
standard continuous inference methods from partial information. It is claimed
that an explicit partition of information into a priori (prior knowledge) and a
posteriori information (data) is an important way of standardizing inference
approaches so that they can be compared on a normative scale, and so that
notions of optimal algorithms become farther-reaching. The inference methods
considered include neural network approaches, information-based complexity, and
Monte Carlo, spline, and regularization methods. The model is an extension of
currently used continuous complexity models, with a class of algorithms in the
form of optimization methods, in which an optimization functional (involving
the data) is minimized. This extends the family of current approaches in
continuous complexity theory, which include the use of interpolatory algorithms
in worst and average case settings.
Relationships among Interpolation Bases of Wavelet Spaces and Approximation Spaces
A multiresolution analysis is a nested chain of related approximation
spaces. This nesting in turn implies relationships among interpolation bases in
the approximation spaces and their derived wavelet spaces. Using these
relationships, a necessary and sufficient condition is given for existence of
interpolation wavelets, via analysis of the corresponding scaling functions. It
is also shown that any interpolation function for an approximation space plays
the role of a special type of scaling function (an interpolation scaling
function) when the corresponding family of approximation spaces forms a
multiresolution analysis. Based on these interpolation scaling functions, a new
algorithm is proposed for constructing corresponding interpolation wavelets
(when they exist in a multiresolution analysis). In simulations, our theorems
are tested on several typical wavelet spaces, demonstrating the existence of
interpolation wavelets and their construction in a general multiresolution
analysis.
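A classic example of an interpolation scaling function, standard in the wavelet literature (not taken from this paper), is the sinc function of the Shannon multiresolution analysis. A minimal numerical check of the defining interpolation condition phi(k) = delta_{0,k} at the integers:

```python
import numpy as np

# The Shannon scaling function phi(x) = sin(pi x)/(pi x) is a
# standard example of an interpolation scaling function: it takes
# the value 1 at x = 0 and 0 at every other integer, which is
# exactly the interpolation condition phi(k) = delta_{0,k}.
k = np.arange(-5, 6)
phi = np.sinc(k)  # numpy's sinc is the normalized sinc sin(pi x)/(pi x)

assert np.allclose(phi, (k == 0).astype(float))
```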
On the average uncertainty for systems with nonlinear coupling
The increased uncertainty and complexity of nonlinear systems have motivated
investigators to consider generalized approaches to defining an entropy
function. New insights are achieved by defining the average uncertainty in the
probability domain as a transformation of entropy functions. The Shannon
entropy when transformed to the probability domain is the weighted geometric
mean of the probabilities. For the exponential and Gaussian distributions, we
show that the weighted geometric mean of the distribution is equal to the
density of the distribution at the location plus the scale, i.e. at the width
of the distribution. The average uncertainty is generalized via the weighted
generalized mean, in which the moment is a function of the nonlinear source.
Both the Renyi and Tsallis entropies transform to this definition of the
generalized average uncertainty in the probability domain. For the generalized
Pareto and Student's t-distributions, which are the maximum entropy
distributions for these generalized entropies, the appropriate weighted
generalized mean also equals the density of the distribution at the location
plus scale. A coupled entropy function is proposed, which is equal to the
normalized Tsallis entropy divided by one plus the coupling.
Comment: 24 pages, including 4 figures and 1 table
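The transformation of Shannon entropy to the probability domain can be checked numerically: for a discrete distribution, the weighted geometric mean of the probabilities, with each probability serving as its own weight, equals exp(-H), where H is the Shannon entropy in nats. A minimal sketch (the distribution below is illustrative):

```python
import numpy as np

# Illustrative discrete distribution
p = np.array([0.5, 0.25, 0.125, 0.125])

# Shannon entropy in nats
H = -np.sum(p * np.log(p))

# Weighted geometric mean of the probabilities, each probability
# weighted by itself: prod(p_i ** p_i) = exp(sum(p_i * ln p_i))
G = np.prod(p ** p)

# Transforming Shannon entropy to the probability domain:
# the weighted geometric mean equals exp(-H)
assert np.isclose(G, np.exp(-H))
```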
Use of the geometric mean as a statistic for the scale of the coupled Gaussian distributions
The geometric mean is shown to be an appropriate statistic for the scale of a
heavy-tailed coupled Gaussian distribution or equivalently the Student's t
distribution. The coupled Gaussian is a member of a family of distributions
parameterized by the nonlinear statistical coupling which is the reciprocal of
the degree of freedom and is proportional to fluctuations in the inverse scale
of the Gaussian. Existing estimators of the scale of the coupled Gaussian have
relied on estimates of the full distribution, and they suffer from problems
related to outliers in heavy-tailed distributions. In this paper, the scale of
a coupled Gaussian is proven to be equal to the product of the generalized mean
and the square root of the coupling. From our numerical computations of the
scales of coupled Gaussians using the generalized mean of random samples, it is
indicated that only samples from a Cauchy distribution (with coupling parameter
one) form an unbiased estimate with diminishing variance for large samples.
Nevertheless, we also prove that the scale is a function of the geometric mean,
the coupling term and a harmonic number. Numerical experiments show that this
estimator is unbiased with diminishing variance for large samples for a broad
range of coupling values.
Comment: 17 pages, 5 figures
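As a rough numerical illustration of the Cauchy case (coupling one), the geometric mean of the absolute values of centered Cauchy samples converges to the scale, since E[ln|X|] = ln(scale) for that distribution. This sketch is illustrative only and is not the paper's coupled-Gaussian estimator, which also involves the coupling term and a harmonic number:

```python
import numpy as np

rng = np.random.default_rng(0)
scale = 2.0            # true scale of the centered Cauchy
n = 200_000

# Cauchy(0, scale) samples: scaled standard Cauchy draws
x = scale * rng.standard_cauchy(n)

# Geometric mean of |x|; since E[ln|X|] = ln(scale) for a
# centered Cauchy, this converges to the scale as n grows
scale_hat = np.exp(np.mean(np.log(np.abs(x))))

assert abs(scale_hat - scale) < 0.1
```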
Radial bounds for perturbations of elliptic operators
Elliptic operators A = ∑_{|α| ≤ m} b_α(x) D^α, α a multi-index, with leading term positive and constant coefficient, and with lower order coefficients b_α(x) ∈ L^{r_α} + L^∞ (with n/r_α + |α| < m), defined on R^n or a quotient space R^n/∑ U_α, U_α ⊂ R^n, are considered. It is shown that the L^p-spectrum of A is contained in a "parabolic region" Ω of the complex plane enclosing the positive real axis, uniformly in p. Outside Ω, the kernel of the resolvent of A is shown to be uniformly bounded by an L^1 radial convolution kernel. Some consequences are: A can be closed in all L^p (1 ≤ p ≤ ∞), and is essentially self-adjoint in L^2 if it is symmetric; A generates an analytic semigroup e^{-tA} in the right half plane, strongly L^p and pointwise continuous at t = 0. A priori estimates relating the leading term and remainder are obtained, and summability φ(εA)f → φ(0)f as ε → 0, with φ analytic, is proved for f ∈ L^p, with convergence in L^p and on the Lebesgue set of f. More comprehensive summability results are obtained when A has constant coefficients.
Transcription factor-DNA binding via machine learning ensembles
The network of interactions between transcription factors (TFs) and their regulatory gene targets governs many of the behaviors and responses of cells. Construction of a transcriptional regulatory network involves three interrelated problems, defined for any regulator: finding (1) its target genes, (2) its binding motif and (3) its DNA binding sites. Many tools have been developed in the last decade to solve these problems. However, the performance of algorithms for these problems has not been consistent across all transcription factors. Because machine learning algorithms have shown advantages in integrating information of different types, we investigate a machine learning approach to integrating predictions from an ensemble of commonly used motif exploration algorithms.
Published version
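As a sketch of the ensemble idea only (the tool names and per-position calls below are hypothetical, not outputs of the algorithms studied in the paper), binary binding-site predictions from several motif tools can be combined by a simple majority vote:

```python
import numpy as np

# Hypothetical per-position binding-site calls (1 = site) from
# three motif-finding tools; names and values are illustrative.
predictions = {
    "tool_a": np.array([1, 0, 1, 1, 0]),
    "tool_b": np.array([1, 1, 1, 0, 0]),
    "tool_c": np.array([0, 0, 1, 1, 0]),
}

# Majority vote: call a site wherever at least 2 of 3 tools agree
votes = np.sum(list(predictions.values()), axis=0)
ensemble_call = (votes >= 2).astype(int)

print(ensemble_call)  # [1 0 1 1 0]
```

More sophisticated integrators (e.g., a trained stacking classifier over tool scores) follow the same pattern of treating each tool's output as a feature.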
Bioinformatics and Biomedical Informatics
Published version